The Labeled Segmentation of Printed Books
Abstract
We introduce the task of book structure labeling: segmenting a printed book into its structural parts and assigning each part a label from a fixed set of categories (such as TABLE OF CONTENTS, PREFACE, INDEX). We manually annotate the page-level structural categories for a large dataset totaling 294,816 pages in 1,055 books evenly sampled from 1750–1922, and present empirical results comparing the performance of several classes of models. The best-performing model, a bidirectional LSTM with rich features, achieves an overall accuracy of 95.8 and a class-balanced macro F-score of 71.4.
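The gap between the two reported metrics is easier to see with a small worked example. The sketch below is illustrative only and is not the authors' evaluation code: the page labels are invented, the `BODY` category is a hypothetical stand-in for a dominant class, and averaging over every label seen in either sequence is one common convention for macro F1, assumed here.

```python
def accuracy(gold, pred):
    """Fraction of pages whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


# Invented page-level labels for a ten-page book; BODY is a hypothetical category.
gold = ["BODY"] * 8 + ["INDEX", "PREFACE"]
pred = ["BODY"] * 8 + ["BODY", "PREFACE"]  # the single INDEX page is missed

acc = accuracy(gold, pred)
mf1 = macro_f1(gold, pred)
print(f"accuracy={acc:.3f} macro_f1={mf1:.3f}")
```

In this toy case one missed page out of ten barely moves accuracy, but it zeroes the F1 of the rare class and pulls the macro average down sharply. The same effect would explain how a model can score high on overall accuracy and much lower on a class-balanced macro F-score.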
Similar resources
A Modified Character Segmentation Algorithm for Farsi Printed Text Using Upper Contour Labelling
In this paper, a modified segmentation algorithm for printed Farsi words is presented. This algorithm is based on a previous work by Azmi that uses the conditional labeling of the upper contour to find the segmentation points. The main objective is to improve the segmentation results for low quality prints. To achieve this, various modifications on local baseline detection, contour labeling an...
Persian Printed Document Analysis and Page Segmentation
This paper presents a hybrid method, combining low-resolution and high-resolution analysis, for Persian page segmentation. In the low-resolution page segmentation, a pyramidal image structure is constructed for multiscale analysis and the document image is segmented into a set of regions. In the high-resolution page segmentation, connected components analysis is used to segment each region into homogeneous regions and identifyi...
The Comparative Effects of Using Electronic Short Story Books and Traditional Printed Texts on EFL Learners’ Reading Comprehension
The purpose of this study was to investigate the comparative effect of using electronic short story books and traditional printed texts on EFL learners’ reading comprehension. For that purpose, ninety female learners between the ages of fifteen and thirty-five sat for the language proficiency test (PET, 2009) as the test of homogeneity, and consequently sixty students were selected based on t...
Evaluating the Readability Level of Authored Fiction Books Selected by the Children's Book Council
Purpose: This research aimed at the investigation of the readability level of 100 prominent authored fiction books for B, C, and D age groups, selected by Children's Book Council of Iran as an official institute for labeling and assigning level of children’s books in Iran. Methodology: Evaluative research method was used for the implementation of this research. Research population consisted ...
Publication date: 2017